Component sizes in networks with arbitrary degree distributions
We give an exact solution for the complete distribution of component sizes in random networks
with arbitrary degree distributions. The solution tells us the probability that a randomly chosen 
node belongs to a component of size s, for any s. We apply our results to networks with the three 
most commonly studied degree distributions — Poisson, exponential, and power-law — as well as to 
the calculation of cluster sizes for bond percolation on networks, which correspond to the sizes of 
outbreaks of SIR epidemic processes on the same networks. For the particular case of the power-law
degree distribution, we show that the component size distribution itself follows a power law
everywhere below the phase transition at which a giant component forms, but takes an exponential 
form when a giant component is present.

There has in recent years been considerable interest
within the physics community in the properties of networks [1, 2, 3].
Methods from physics, and particularly
from statistical physics, have proved invaluable for un- 
derstanding the structure and behavior of networked sys- 
tems such as the Internet, the world wide web, metabolic 
networks, protein interaction networks, and social net- 
works of interactions between people. In particular, by 
creating simple (and sometimes not-so-simple) models of 
network structure and formation, researchers have gained 
insight about the way networks behave as a function of 
the basic parameters governing their topology. 

One of the most fundamental parameters of a network 
is its degree distribution. The degree of a node or vertex 
in a network is the number of edges connected to that 
vertex, and the frequency distribution of the degrees of 
vertices has been shown to have a profound influence on 
almost every aspect of network structure and function, 
including path lengths, clustering, robustness, centrality 
indices, spreading processes, and many others. Various 
network models have been used to illuminate the effects 
of the degree distribution, but perhaps the most widely 
studied, and certainly one of the simplest, is the so-called 
configuration model. 

In the configuration model only the degrees of vertices 
are specified and nothing else; except for the constraint 
imposed by the degrees, connections between vertices are 
random. Equivalently, configuration model networks can 
be thought of as networks drawn uniformly at random 
from the set of all possible networks whose vertices have 
the specified degrees. One of the primary attractions of 
the configuration model is that many of its properties 
can be calculated exactly in the limit of large system size 
and for this reason it has become one of the fundamen- 
tal tools for the quantitative understanding and study of 
networks. In 1995 Molloy and Reed [4] gave an exact
criterion for the existence of a giant component in the model
and later also gave an expression for the expected size of
that component [5]. Newman, Strogatz, and Watts [6]
gave additional expressions for a variety of other properties,
including the number of vertices a given distance from a
randomly chosen vertex, the average path length in the giant
component, and critical exponents near the transition at
which the giant component appears, as well as generalizations
of the model to bipartite and directed networks,
and many further results have been presented since by a
variety of authors.

One fundamental result that has been missing, how- 
ever, is an expression for the sizes of components in the 
model other than the giant component. More specifically, 
if we choose a vertex at random from the network, what 
is the probability that it belongs to a component of a 
given size? As well as being a central structural prop- 
erty of the network, this distribution is directly related 
to important practical issues such as the distribution of 
the sizes of disease outbreaks for diseases spreading over 
contact networks [7, 8].

At first sight, calculation of the component sizes ap- 
pears difficult. One can derive equations that must be 
satisfied by the generating function for the distribution 
of component sizes [6], but usually these equations cannot
be solved. Here we show, however, that it is nonetheless
possible to derive an explicit expression for the complete
distribution of component sizes in the configuration
model for general degree distribution. In particular, we 
show that it is possible to derive closed-form expressions 
for component sizes for the three most commonly stud- 
ied degree distributions, the Poisson, exponential, and 
power-law distributions. We also show that the same 
techniques can be used to calculate the sizes of percola- 
tion clusters for percolation models on networks of arbi- 
trary degree distribution, a development of some interest 
because of the close connection between percolation and 
epidemic processes. We explore this connection in the 
last part of the paper. 

Let $p_k$ be the degree distribution of our network,
i.e., the probability that a randomly chosen vertex has
degree $k$. If rather than a vertex we choose an edge and
follow it to the vertex at one of its ends, then the number
of other edges emerging from that vertex follows a different
distribution, the so-called excess degree distribution:

$$ q_k = \frac{(k+1)\,p_{k+1}}{\langle k\rangle}, \tag{1} $$

as shown in, for example, Ref. [6]. Here $\langle k\rangle = \sum_k k p_k$ is
the average degree in the network.

It will be convenient to introduce the probability generating
functions for the two distributions $p_k$ and $q_k$, thus:

$$ g_0(z) = \sum_{k=0}^{\infty} p_k z^k, \qquad g_1(z) = \sum_{k=0}^{\infty} q_k z^k. \tag{2} $$

Many of our results are more easily expressed in terms of
these generating functions than directly in terms of the
degree distributions. It will also be convenient to note
that

$$ \langle k\rangle = g_0'(1), \qquad g_1(z) = \frac{g_0'(z)}{g_0'(1)}, \tag{3} $$

where we have made use of Eq. (1) in the second equality.
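The relation between the two distributions is easy to check numerically. The following is a minimal sketch, assuming the standard definition $q_k = (k+1)p_{k+1}/\langle k\rangle$ and a degree distribution stored as a finite Python list; the function name and the example probabilities are illustrative and do not come from the paper.

```python
def excess_degree_distribution(p):
    """Return q[k] = (k + 1) * p[k + 1] / <k> for a finite degree distribution p[k]."""
    mean_degree = sum(k * pk for k, pk in enumerate(p))
    if mean_degree == 0:
        raise ValueError("mean degree is zero, so excess degrees are undefined")
    return [(k + 1) * p[k + 1] / mean_degree for k in range(len(p) - 1)]


# Hypothetical example: degrees 0..3 with these probabilities.
p = [0.1, 0.4, 0.3, 0.2]
q = excess_degree_distribution(p)
print(q)  # q is itself a normalized distribution over excess degrees 0..2
```

Note that $q_k$ is normalized automatically, because $\sum_k (k+1)p_{k+1} = \langle k\rangle$.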

Now let us consider the distribution of the sizes of com- 
ponents in our network. Every vertex belongs to a com- 
ponent of size at least one (the vertex itself) and every 
edge connected to the vertex adds at least one more ver- 
tex to the component, and possibly many, if there are 
lots of other vertices that are reachable via that edge. 
Let us denote by $t$ the total number of vertices reachable
via a particular edge, let the probability distribution of $t$
be $\rho_t$, and let the generating function for this distribution
be $h_1(z) = \sum_t \rho_t z^t$.

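The probability that a vertex of degree $k$ belongs to a
component of size $s$ is the probability that the numbers
of vertices reachable along each of its $k$ edges sum to
$s - 1$. This probability, which we will denote $P(s|k)$, is
given by

$$ P(s|k) = \sum_{t_1=1}^{\infty} \cdots \sum_{t_k=1}^{\infty} \delta\Bigl(s-1, \sum_{m=1}^{k} t_m\Bigr) \prod_{m=1}^{k} \rho_{t_m}, \tag{4} $$

where $\delta(i,j)$ is the Kronecker delta symbol. Then the
probability $\pi_s$ of a randomly chosen vertex belonging to
a component of size $s$ is $\pi_s = \sum_{k=0}^{\infty} p_k P(s|k)$ and the
corresponding generating function is

$$ h_0(z) = \sum_{s=1}^{\infty} \pi_s z^s = \sum_{k=0}^{\infty} p_k \sum_{s=1}^{\infty} z^s P(s|k) = z \sum_{k=0}^{\infty} p_k \Bigl[ \sum_{t=1}^{\infty} \rho_t z^t \Bigr]^k = z \sum_{k=0}^{\infty} p_k \bigl[ h_1(z) \bigr]^k. \tag{5} $$

But the final sum is simply the generating function $g_0(z)$,
Eq. (2), evaluated at $h_1(z)$, and hence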



$$ h_0(z) = z\,g_0(h_1(z)). \tag{6} $$

By a similar argument the generating function $h_1(z)$ can
be shown to satisfy

$$ h_1(z) = z\,g_1(h_1(z)). \tag{7} $$

Between them, Eqs. (6) and (7) allow us, in principle,
to calculate the entire distribution of cluster sizes in
our network given the degree distribution $p_k$. Unfortunately,
the self-consistent relation for $h_1(z)$, Eq. (7), is
in most cases not solvable and hence we cannot calculate
the value of the generating function in closed form. Surprisingly,
however, we can still calculate the probabilities $\pi_s$.
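Although the self-consistent relation $h_1(z) = z\,g_1(h_1(z))$ rarely has a closed-form solution, its value at a particular real $z$ in $[0,1]$ can be approximated numerically. The sketch below uses plain fixed-point iteration starting from zero, which is one simple choice and not a method taken from the paper. The Poisson generating functions and the value $c = 1.5$ are illustrative assumptions, and the snippet relies on the fact that $h_0(1) = \sum_s \pi_s$ is the fraction of vertices in finite components.

```python
import math


def solve_h1(g1, z, tol=1e-14, max_iter=100000):
    """Iterate h <- z * g1(h) from h = 0 to approximate h_1(z) for 0 <= z <= 1."""
    h = 0.0
    for _ in range(max_iter):
        h_next = z * g1(h)
        if abs(h_next - h) < tol:
            return h_next
        h = h_next
    return h  # tolerance not reached; returned as a best estimate


# Illustrative assumption: Poisson degrees with mean c, for which g0 = g1.
c = 1.5


def g_poisson(x):
    return math.exp(c * (x - 1.0))


u = solve_h1(g_poisson, 1.0)       # h_1(1)
finite_fraction = g_poisson(u)     # h_0(1) = 1 * g_0(h_1(1))
giant_fraction = 1.0 - finite_fraction
print(giant_fraction)
```

Starting from zero, the iteration increases monotonically toward the smallest non-negative fixed point, which is the physically relevant root when a giant component is present.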

Since every component is of size at least 1, the generating
function $h_0(z)$ for the component sizes is of leading
order $z$ (or higher) and hence contains an overall factor
of $z$. Dividing out this factor and differentiating, we can
write the probability of belonging to a cluster of size $s$ as

$$ \pi_s = \frac{1}{(s-1)!} \left[ \frac{d^{s-1}}{dz^{s-1}} \left( \frac{h_0(z)}{z} \right) \right]_{z=0}. \tag{8} $$

Using Eq. (6) this can also be written

$$ \pi_s = \frac{1}{(s-1)!} \left[ \frac{d^{s-2}}{dz^{s-2}} \Bigl( g_0'(h_1(z))\, h_1'(z) \Bigr) \right]_{z=0}. \tag{9} $$

This expression can be rewritten using Cauchy's formula
for the $n$th derivative of a function,

$$ \left. \frac{d^n f}{dz^n} \right|_{z=z_0} = \frac{n!}{2\pi i} \oint \frac{f(z)}{(z - z_0)^{n+1}}\, dz, \tag{10} $$

where the integral is around a contour that encloses $z_0$
in the complex plane but encloses no poles of $f(z)$. Applying
this formula to Eq. (9) with $z_0 = 0$, we get

$$ \pi_s = \frac{1}{2\pi i\,(s-1)} \oint \frac{g_0'(h_1(z))}{z^{s-1}}\, \frac{dh_1}{dz}\, dz \tag{11a} $$

$$ \phantom{\pi_s} = \frac{\langle k\rangle}{2\pi i\,(s-1)} \oint \frac{g_1(h_1)}{z^{s-1}}\, dh_1, \tag{11b} $$

where we have used Eq. (3) to eliminate $g_0'$ in favor of $g_1$.
In (11a) we choose the contour to be an infinitesimal loop
around the origin and, since $h_1(z)$ goes to zero as $z \to 0$,
the contour in (11b) is then also an infinitesimal loop
around the origin.

Now regarding $z$ as a function of $h_1$, rather than the
other way around, we make use of (7) to eliminate $z$ and
write

$$ \pi_s = \frac{\langle k\rangle}{2\pi i\,(s-1)} \oint \frac{[g_1(h_1)]^s}{h_1^{s-1}}\, dh_1. \tag{12} $$

Applying (10) again we then find that

$$ \pi_s = \frac{\langle k\rangle}{(s-1)!} \left[ \frac{d^{s-2}}{dz^{s-2}} [g_1(z)]^s \right]_{z=0}. \tag{13} $$

(An alternative and equivalent way to derive this
formula, although a less transparent one, would be to
rearrange Eq. (7) to give $z$ as a function of $h_1$ and then
apply the Lagrange inversion theorem [9] to derive the
Taylor expansion of $h_1$ or $h_0$. Indeed, Eqs. (8) to (13)
are essentially a proof of a special case of the inversion
theorem, as applied to the problem in hand.)
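As a small sanity check on the general result $\pi_s = \frac{\langle k\rangle}{(s-1)!}\bigl[\frac{d^{s-2}}{dz^{s-2}}[g_1(z)]^s\bigr]_{z=0}$, the first two nontrivial cases can be worked out by hand and compared with a direct enumeration of small tree-like components. This is an illustrative calculation under the definitions used above ($q_0 = p_1/\langle k\rangle$, $q_1 = 2p_2/\langle k\rangle$), not a step in the derivation itself.

```latex
\begin{align*}
\pi_2 &= \langle k\rangle\,[g_1(0)]^2
       = \langle k\rangle\, q_0^2
       = \frac{p_1^2}{\langle k\rangle}
       = p_1 q_0, \\
\pi_3 &= \frac{\langle k\rangle}{2}\,
         \frac{d}{dz}\,[g_1(z)]^3 \Big|_{z=0}
       = \frac{3}{2}\,\langle k\rangle\, q_0^2\, q_1
       = p_2\, q_0^2 + p_1\, q_1\, q_0 .
\end{align*}
```

The final forms have a direct reading. A component of size 2 requires the chosen vertex to have degree 1 and its neighbor to have excess degree 0. A component of size 3 arises either when the chosen vertex is the middle of a three-vertex chain (degree 2, both neighbors with excess degree 0) or when it is an end of the chain (degree 1, a neighbor with excess degree 1, and a final vertex with excess degree 0).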

The only exception to Eq. (13) is for the case $s = 1$, for
which Eq. (11) gives 0/0 and is therefore clearly incorrect.
However, since the only way to belong to a component
of size 1 is to have no connections to any other vertices,
the probability $\pi_1$ is trivially equal to the probability of
having degree zero:

$$ \pi_1 = p_0. \tag{14} $$

Between them, Eqs. (13) and (14) give the entire distribution
of component sizes in terms of the degree distribution.
They tell us explicitly the probability that a
randomly chosen vertex belongs to a component of any
given size $s$. For any specific choice of degree distribution,
the application of Eq. (13) still requires us to perform the
derivatives. Any finite number of derivatives can always
be carried out exactly to give expressions for $\pi_s$ to finite
order. It is also possible in some cases to find a general
formula for any derivative and so derive a closed-form
expression for $\pi_s$ for general $s$. In particular, it turns out
to be possible, as we now show, to find such closed-form
expressions for the three distributions most commonly
studied in the literature, the Poisson, exponential, and
power-law distributions.
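For a degree distribution with finite support, the derivatives can be carried out mechanically, because $\frac{1}{(s-1)!}\frac{d^{s-2}}{dz^{s-2}}f(z)\big|_{z=0}$ equals the coefficient of $z^{s-2}$ in $f$ divided by $s-1$. The sketch below applies this to $f = [g_1(z)]^s$ using truncated polynomial multiplication. It assumes the formula $\pi_s = \frac{\langle k\rangle}{(s-1)!}\bigl[\frac{d^{s-2}}{dz^{s-2}}[g_1(z)]^s\bigr]_{z=0}$ together with $\pi_1 = p_0$; the function names and example distributions are hypothetical.

```python
def poly_mul_trunc(a, b, n):
    """Multiply two coefficient lists, keeping only terms up to order n."""
    c = [0.0] * (n + 1)
    for i, ai in enumerate(a[: n + 1]):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b[: n + 1 - i]):
            c[i + j] += ai * bj
    return c


def component_size_distribution(p, s_max):
    """Return {s: pi_s} for s = 1..s_max from a finite degree distribution p[k]."""
    mean_degree = sum(k * pk for k, pk in enumerate(p))
    q = [(k + 1) * p[k + 1] / mean_degree for k in range(len(p) - 1)]
    pi = {1: p[0]}
    for s in range(2, s_max + 1):
        n = s - 2
        power = [1.0] + [0.0] * n          # coefficients of g_1(z)^0
        for _ in range(s):
            power = poly_mul_trunc(power, q, n)
        # (s-2)!/(s-1)! = 1/(s-1), so only the z^(s-2) coefficient is needed
        pi[s] = mean_degree / (s - 1) * power[n]
    return pi


# Hypothetical example distribution over degrees 0..3.
pi = component_size_distribution([0.1, 0.4, 0.3, 0.2], 3)
print(pi)
```

With degrees restricted to 0 and 1, for example `[0.5, 0.5]`, the same function returns $\pi_1 = \pi_2 = 0.5$ and zero for all larger $s$, as expected for a network made only of isolated vertices and isolated pairs.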

A network in which edges are placed between vertices
uniformly at random has a Poisson degree distribution

$$ p_k = e^{-c}\, \frac{c^k}{k!}, \tag{15} $$

where $c$ is the distribution mean. Such networks have
been studied widely for some decades, most famously
by Erdős and Rényi in the 1950s and 1960s [10, 11].
Given Eq. (15), it is straightforward to show that $g_0(z) =
g_1(z) = e^{c(z-1)}$ and the derivatives in Eq. (13) can be
performed to give

$$ \pi_s = \frac{e^{-cs}\,(cs)^{s-1}}{s!}. \tag{16} $$

(The same expression also works for the special case $s =
1$.) This expression for the component size distribution of
the Poisson random graph has been derived in the past
by a number of other methods (see for instance [12]),
but it is a useful check on our methods to see it appear
here as a special case of the more general formulation.
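The Poisson result can be checked numerically. The sketch below evaluates $\pi_s = e^{-cs}(cs)^{s-1}/s!$ in log space to avoid overflow, and verifies two known properties: below the transition ($c < 1$) the probabilities sum to one, while above it they sum to $1 - S$, where the giant-component fraction $S$ satisfies $S = 1 - e^{-cS}$. The function name, the cutoffs, and the values of $c$ are illustrative choices.

```python
import math


def poisson_component_sizes(c, s_max):
    """Return [pi_1, ..., pi_s_max] with pi_s = exp(-c s) (c s)^(s-1) / s!."""
    return [
        math.exp(-c * s + (s - 1) * math.log(c * s) - math.lgamma(s + 1))
        for s in range(1, s_max + 1)
    ]


# Below the transition: all vertices are in finite components.
sub = poisson_component_sizes(0.5, 200)
total_sub = sum(sub)

# Above the transition: the shortfall from 1 is the giant component.
sup = poisson_component_sizes(1.5, 400)
giant = 1.0 - sum(sup)
print(total_sub, giant)
```

The truncation points 200 and 400 are chosen so that the neglected exponential tails are far below double-precision resolution for these two values of $c$.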

Few real-world networks, however, have Poisson degree
distributions. Most have highly right-skewed distributions
in which most vertices have low degree and a
small number of "hubs" have higher degree. A number of
networks, for example, are observed to have exponential
degree distributions or distributions with an exponential
tail. Examples include food webs, power grids, and
some social networks [13, 14]. Consider the exponential
distribution $p_k = C e^{-\lambda k}$, where $C = 1 - e^{-\lambda}$ is the appropriate
normalizing constant. The generating functions in this case are

$$ g_0(z) = \frac{1 - e^{-\lambda}}{1 - z e^{-\lambda}}, \qquad g_1(z) = \left[ \frac{1 - e^{-\lambda}}{1 - z e^{-\lambda}} \right]^2. \tag{17} $$

Again the derivatives are straightforward to carry out
and we find that

$$ \pi_s = \frac{(3s-3)!}{(s-1)!\,(2s-1)!}\; e^{-\lambda(s-1)} \left( 1 - e^{-\lambda} \right)^{2s-1}. \tag{18} $$

Applying Stirling's approximation for large $s$ we can
show that this distribution behaves asymptotically as
$\pi_s \sim s^{-3/2} e^{\mu s}$, where $\mu = 2\ln\bigl[\tfrac{3\sqrt{3}}{2}(1 - e^{-\lambda})\bigr] - \lambda \le 0$. Thus the
component size distribution approximately follows an exponential
law itself, although with an extra leading factor
of $s^{-3/2}$ and a different exponential constant.
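A short numerical check of the exponential case is sketched below. It assumes the closed form $\pi_s = \frac{(3s-3)!}{(s-1)!\,(2s-1)!}\,e^{-\lambda(s-1)}(1-e^{-\lambda})^{2s-1}$, which follows from $p_k = (1-e^{-\lambda})e^{-\lambda k}$, and evaluates the factorials through `math.lgamma`. The value $\lambda = 2$ is an illustrative choice on the side of the transition with no giant component (the transition sits at $\lambda = \ln 3$), so the probabilities should sum to one.

```python
import math


def exponential_component_sizes(lam, s_max):
    """Return [pi_1, ..., pi_s_max] for the degree distribution p_k ~ exp(-lam k)."""
    a = math.exp(-lam)
    sizes = []
    for s in range(1, s_max + 1):
        # (3s-3)! = Gamma(3s-2), (s-1)! = Gamma(s), (2s-1)! = Gamma(2s)
        log_pi = (
            math.lgamma(3 * s - 2) - math.lgamma(s) - math.lgamma(2 * s)
            - lam * (s - 1)
            + (2 * s - 1) * math.log1p(-a)
        )
        sizes.append(math.exp(log_pi))
    return sizes


lam = 2.0
pi_exp = exponential_component_sizes(lam, 300)
total_exp = sum(pi_exp)
print(total_exp)
```

The first two entries reduce to $\pi_1 = p_0 = 1 - e^{-\lambda}$ and $\pi_2 = p_1^2/\langle k\rangle = e^{-\lambda}(1-e^{-\lambda})^3$, which gives an independent check of the prefactor.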

However, perhaps the greatest amount of attention in
recent years has been focused on networks that have
power-law degree distributions of the form $p_k \propto k^{-\alpha}$
for some constant exponent $\alpha$ [15, 16, 17]. A number
of networks appear to follow this pattern, at least
approximately, including the world wide web, the Internet,
citation networks, and some social and biological
networks. The observed value of the exponent typically
lies in the range $2 < \alpha < 3$. Equivalently, we could
say that the excess degree distribution $q_k$, which
appears in the fundamental formula (13) via its generating
function $g_1(z)$, follows a power law with exponent $\alpha - 1$.

In fact, in essentially all cases, the observed power law
holds only in the tail of the distribution; the distribution
follows some other law for small degrees. This leaves us
considerable latitude about the distribution we use in our
calculations. Here we use a so-called Yule distribution
for $q_k$, with a typical real-world value of $\alpha = 2.5$ for the
exponent:

where $\Gamma(x)$ is the standard gamma function and $C$ is
again a normalizing constant. It is straightforward to
show (by Stirling's approximation) that this distribution
asymptotically follows a power law $q_k \sim k^{-3/2}$, which
corresponds to a raw degree distribution $p_k \sim k^{-5/2}$. The
Yule distribution appears in a number of contexts in the
study of networks, particularly in the solutions of preferential
attachment models that may explain the origin
of power laws in some networks [18, 19], and is considered
by some to be the most natural choice of power-law
form for discrete distributions. Employing this particular
choice for our configuration model gives

FIG. 1: The distribution of component sizes in random graphs
with Poisson ($c = 1.5$), exponential ($\lambda = 1$), and power-law
($\alpha = 2.5$) degree distributions. Solid lines indicate the exact
solutions derived in this paper. Points are the results of
computer simulations for the same degree distributions. Each
point is an average over 5000 networks of $10^6$ vertices each.
Error bars have been omitted, but are smaller than the data
points in each case.



which in turn gives

Setting $n = s - 2$ and substituting into Eq. (13), we can
complete the remaining sum to get

In Fig. 1 we show the form of this distribution, along
with those for the Poisson and exponential networks,
Eqs. (16) and (18). Also shown in the figure are numerical
results for the distributions of component sizes
measured on computer-generated networks with the same
degree distributions. As the figure shows, there is excellent
agreement between the simulations and the exact
calculations.
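A simulation of this kind can be sketched in a few lines. The code below is an illustrative reconstruction, not the author's simulation code: it draws Poisson degrees, pairs edge stubs uniformly at random (which allows occasional self-loops and multi-edges, and leaves one stub unmatched if the total is odd), and uses a union-find structure to measure the fraction of vertices in components of each size. The network size, the seed, and the value $c = 0.5$ are arbitrary choices, far smaller than the networks described in the figure caption.

```python
import math
import random
from collections import Counter


def sample_poisson(c, rng):
    """Draw one Poisson(c) variate by Knuth's multiplication method (small c)."""
    limit = math.exp(-c)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def component_fractions(degrees, rng):
    """Build one configuration-model network; return {s: fraction of vertices in size-s components}."""
    n = len(degrees)
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Pair consecutive stubs in the shuffled list to form edges.
    for i in range(0, len(stubs) - 1, 2):
        a, b = find(stubs[i]), find(stubs[i + 1])
        if a != b:
            parent[a] = b

    sizes = Counter(find(v) for v in range(n))
    vertex_counts = Counter()
    for size in sizes.values():
        vertex_counts[size] += size
    return {s: count / n for s, count in vertex_counts.items()}


rng = random.Random(1)
n, c = 20000, 0.5
degrees = [sample_poisson(c, rng) for _ in range(n)]
fractions = component_fractions(degrees, rng)
print(fractions[1], math.exp(-c))  # measured versus exact pi_1 = exp(-c)
```

For a single network of this size the measured fractions fluctuate at the level of a fraction of a percent, which is why the figure averages over many much larger networks.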

As with the exponential network, we can study the
asymptotic form of the component size distribution (23)
for the power-law network by making use of Stirling's
approximation. We find that in the limit of large $s$, $\pi_s \sim
s^{-3/2} e^{-\nu s}$, where $\nu = 5\ln 2 - 3\ln 3 \simeq 0.1699\ldots$ Thus again
we have an exponential tail to the distribution.

This last result is at first slightly surprising. One might 
imagine that the component size distribution should it- 
self fall off as a power law or slower because the degree 
of a vertex provides a lower bound on the size of the 
component to which the vertex belongs — the fraction of 



vertices in components of size s or greater must be at 
least as large as the fraction of vertices of degree s or 
greater and hence the cumulative distribution of com- 
ponents falls off as slow or slower than the cumulative 
distribution of degrees. 

So how is it possible that we have an exponential
distribution of component sizes in the present case? The
answer is that we are studying a network that has a giant
component. Vertices not in the giant component, which
make up almost all of the component size distribution,
have a different degree distribution from the graph as
a whole because the probability of not being in the giant
component dwindles exponentially with increasing
degree. This creates an exponential cutoff for the
degree distribution, and hence we are back to the situation
we had for the exponential network, which gave an
exponential component size distribution.

Thus in a power-law network we expect $\pi_s$ to have
an exponential tail whenever there is a giant component
in the network, but a power-law tail when there is no
giant component. This contrasts with the case for essentially
every other degree distribution, where we expect a
power-law distribution of component sizes only precisely
at the phase transition where the giant component forms;
everywhere else we expect the distribution to fall off
exponentially or faster [6].

The methods described here can be extended to the
calculation of cluster sizes for percolation processes on
networks also. Of particular interest is the bond percolation
process, whose cluster sizes give the distribution
of outbreaks for a standard SIR epidemiological process
on the same network [7, 20]. Bond percolation can be
framed in the same language as the calculation of component
sizes above by considering the network formed by
just the occupied edges. If the occupation probability
is $\phi$, then it is straightforward to show [8] that the generating
functions for the degree distribution and excess degree
distribution of this latter network are $g_0(1 - \phi + \phi z)$
and $g_1(1 - \phi + \phi z)$, with $g_0$ and $g_1$ defined as before.
Substituting into Eq. (13), we then find

$$ \pi_s = \frac{\langle k\rangle\, \phi^{s-1}}{(s-1)!} \left[ \frac{d^{s-2}}{dz^{s-2}} [g_1(z)]^s \right]_{z=1-\phi}. \tag{24} $$

This result immediately implies that for all $\phi < 1$ the
distribution of cluster sizes falls off at least exponentially
with increasing $s$. Thus, in the language of epidemiology,
we will never see a power-law distribution of outbreak
sizes, even if the network has a power-law degree distribution.
This is, overall, good news: it implies that there
will be no fat tail to the outbreak distribution and hence
no unexpectedly large outbreaks, regardless of whether
the network has a giant component.
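The percolation calculation can be illustrated for a degree distribution with finite support. In the sketch below, each edge is kept independently with probability $\phi$, so the degree distribution of the occupied network is a binomially thinned version of $p_k$ (equivalent to replacing $z$ by $1 - \phi + \phi z$ in the generating functions), and the component size formula is then applied to the thinned distribution. The function names are hypothetical. The example uses a network in which every vertex has degree 2, for which the cluster containing a given vertex is a chain and $\pi_s = s\,\phi^{s-1}(1-\phi)^2$ can be written down directly.

```python
from math import comb


def thin_degree_distribution(p, phi):
    """Degree distribution of the network formed by the occupied edges only."""
    k_max = len(p) - 1
    return [
        sum(p[k] * comb(k, m) * phi**m * (1 - phi) ** (k - m)
            for k in range(m, k_max + 1))
        for m in range(k_max + 1)
    ]


def cluster_size_distribution(p, phi, s_max):
    """Return [pi_1, ..., pi_s_max] for bond percolation with occupation probability phi."""
    p_occ = thin_degree_distribution(p, phi)
    mean = sum(k * pk for k, pk in enumerate(p_occ))
    q = [(k + 1) * p_occ[k + 1] / mean for k in range(len(p_occ) - 1)]
    pi = [p_occ[0]]
    for s in range(2, s_max + 1):
        n = s - 2
        power = [1.0] + [0.0] * n
        for _ in range(s):                     # build g_1^s truncated at order n
            nxt = [0.0] * (n + 1)
            for i, ai in enumerate(power):
                for j, bj in enumerate(q[: n + 1 - i]):
                    nxt[i + j] += ai * bj
            power = nxt
        pi.append(mean / (s - 1) * power[n])
    return pi


phi = 0.3
clusters = cluster_size_distribution([0.0, 0.0, 1.0], phi, 6)
print(clusters)
```

In this example the factor $\phi^{s-1}$ makes the exponential decay of the cluster size distribution explicit for any $\phi < 1$.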

To conclude, we have given an exact solution for the
distribution of component sizes in random graphs with
arbitrary degree distributions and applied it to networks
with Poisson, exponential, and power-law distributed
degrees. In the latter case we find that though the network
has a power-law distribution of component sizes when
there is no giant component, the distribution develops
an exponential tail once a giant component appears. We
have also applied our methods to bond percolation on
networks, finding that percolation clusters always have
an exponential tail to their distribution whenever the
bond occupation probability is less than one.

The author thanks Cris Moore for useful conversations. 
This work was funded in part by the National Science 
Foundation under grant DMS-0405348 and by the Santa 
Fe Institute. 



[1] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of
networks. Advances in Physics 51, 1079-1187 (2002).

[2] M. E. J. Newman, The structure and function of complex
networks. SIAM Review 45, 167-256 (2003).

[3] M. E. J. Newman, A.-L. Barabási, and D. J. Watts, The
Structure and Dynamics of Networks. Princeton University
Press, Princeton (2006).

[4] M. Molloy and B. Reed, A critical point for random
graphs with a given degree sequence. Random Structures
and Algorithms 6, 161-179 (1995).

[5] M. Molloy and B. Reed, The size of the giant component
of a random graph with a given degree sequence.
Combinatorics, Probability and Computing 7, 295-305 (1998).

[6] M. E. J. Newman, S. H. Strogatz, and D. J. Watts,
Random graphs with arbitrary degree distributions and their
applications. Phys. Rev. E 64, 026118 (2001).

[7] P. Grassberger, On the critical behavior of the general
epidemic process and dynamical percolation. Math.
Biosci. 63, 157-172 (1982).

[8] M. E. J. Newman, Spread of epidemic disease on
networks. Phys. Rev. E 66, 016128 (2002).

[9] M. Abramowitz and I. A. Stegun (eds.), Handbook of
Mathematical Functions. Dover Publishing, New York
(1974).

[10] P. Erdős and A. Rényi, On random graphs. Publicationes
Mathematicae 6, 290-297 (1959).

[11] P. Erdős and A. Rényi, On the evolution of random
graphs. Publications of the Mathematical Institute of the
Hungarian Academy of Sciences 5, 17-61 (1960).

[12] B. Bollobás, Random Graphs. Academic Press, New
York, 2nd edition (2001).

[13] L. A. N. Amaral, A. Scala, M. Barthélémy, and H. E.
Stanley, Classes of small-world networks. Proc. Natl.
Acad. Sci. USA 97, 11149-11152 (2000).

[14] J. A. Dunne, R. J. Williams, and N. D. Martinez,
Food-web structure and network theory: The role of
connectance and size. Proc. Natl. Acad. Sci. USA 99,
12917-12922 (2002).

[15] R. Albert, H. Jeong, and A.-L. Barabási, Diameter of the
world-wide web. Nature 401, 130-131 (1999).

[16] M. Faloutsos, P. Faloutsos, and C. Faloutsos, On
power-law relationships of the internet topology. Computer
Communications Review 29, 251-262 (1999).

[17] J. M. Kleinberg, S. R. Kumar, P. Raghavan, S.
Rajagopalan, and A. Tomkins, The Web as a graph:
Measurements, models and methods. In T. Asano, H. Imai,
D. T. Lee, S.-I. Nakano, and T. Tokuyama (eds.),
Proceedings of the 5th Annual International Conference on
Combinatorics and Computing, number 1627 in Lecture
Notes in Computer Science, pp. 1-18, Springer, Berlin
(1999).

[18] S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin,
Structure of growing networks with preferential linking.
Phys. Rev. Lett. 85, 4633-4636 (2000).

[19] P. L. Krapivsky, S. Redner, and F. Leyvraz, Connectivity
of growing random networks. Phys. Rev. Lett. 85,
4629-4632 (2000).

[20] D. Mollison, Spatial contact models for ecological and
epidemic spread. Journal of the Royal Statistical Society
B 39, 283-326 (1977).



